Abstract


ABC algorithms involve a large number of simulations from the model of interest, 
which can be very computationally costly. This paper summarises the lazy ABC 
algorithm of [10], which reduces the computational demand by abandoning many
unpromising simulations before completion. By using a random stopping decision 
and reweighting the output sample appropriately, the target distribution is the same 
as for standard ABC. Lazy ABC is also extended here to the case of non-uniform 
ABC kernels, which is shown to simplify the process of tuning the algorithm 
effectively. 

1 Algorithms 

Approximate Bayesian computation (ABC) approximates Bayesian inference on parameters $\theta$ with
prior $\pi(\theta)$ given data $y_{\text{obs}}$. It must be possible to simulate data $y$ from the model of interest given $\theta$.
This implicitly defines a likelihood function $L(\theta)$: the density of $y_{\text{obs}}$ conditional on $\theta$.

A standard importance sampling version of ABC samples parameters $\theta_{1:N}$ from an importance
density $g(\theta)$ and simulates corresponding datasets $y_{1:N}$. Weights $w_{1:N}$ are calculated by equation (1)
below. Then for a generic function $f(\theta)$, an estimate of its posterior expectation $E[f(\theta) | y_{\text{obs}}]$ is
$\mu_f = \sum_{i=1}^N f(\theta_i) w_i / \sum_{i=1}^N w_i$. An estimate of the normalising constant
$\pi(y_{\text{obs}}) = \int \pi(\theta) L(\theta) \, d\theta$, used in Bayesian model choice, is $z = N^{-1} \sum_{i=1}^N w_i$.
Under the ideal choice of weights, $w_i = L(\theta_i) \pi(\theta_i) / g(\theta_i)$, these estimates converge
(almost surely) to the correct quantities as $N \to \infty$ [10]. In applications where $L(\theta)$ cannot be
evaluated, ABC makes inference possible with the trade-off that it gives approximate results. That is,
the estimators converge to approximations of the desired values.

The ABC importance sampling weights avoid evaluating L{9) by using: 


$$w_{\text{ABC}} = L_{\text{ABC}} \, \pi(\theta) / g(\theta) \qquad (1)$$
$$\text{where } L_{\text{ABC}} = K[d(s(y), s(y_{\text{obs}})) / h] \qquad (2)$$


Here: 


• $L_{\text{ABC}}$ acts as an estimate (up to proportionality) of $L(\theta)$. This and $w_{\text{ABC}}$ are random
variables since they depend on $y$, a random draw from the model conditional on $\theta$.

• $s(\cdot)$ maps a dataset to a lower dimensional vector of summary statistics.

• $d(\cdot, \cdot)$ maps two summary statistic vectors to a non-negative value. This defines the distance
between two vectors.

• $K[\cdot]$, the ABC kernel, maps from a non-negative value to another. A typical choice is a
uniform kernel $K[x] = \mathbb{1}(x \in [0,1])$, which makes an accept/reject decision. Another
choice is a normal kernel such as $K[x] = e^{-x^2/2}$.

• $h > 0$ is a tuning parameter, the bandwidth. It controls how close a match of $s(y)$ and
$s(y_{\text{obs}})$ is required to produce a significant weight.
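The weights in equations (1) and (2) can be illustrated with a minimal sketch on a toy model. Everything specific here is an assumption made for illustration rather than part of the method described above: the prior $\theta \sim N(0, 1)$, the model $y | \theta \sim N(\theta, 1)$, the identity summary statistic, the absolute-difference distance, the scaling of the normal kernel, and the choice of $g$ equal to the prior (so that $\pi(\theta)/g(\theta) = 1$).

```python
import math
import random


def normal_kernel(x):
    # One common scaling of a normal ABC kernel (an assumption of this sketch).
    return math.exp(-0.5 * x * x)


def abc_importance_sampling(y_obs, n, h, rng):
    """Standard ABC importance sampling on a toy model.

    Toy assumptions: prior theta ~ N(0, 1), data y | theta ~ N(theta, 1),
    summary s(y) = y, distance d = absolute difference, and importance
    density g equal to the prior, so pi(theta) / g(theta) = 1.
    """
    thetas, weights = [], []
    for _ in range(n):
        theta = rng.gauss(0.0, 1.0)                # draw theta from g
        y = rng.gauss(theta, 1.0)                  # simulate data given theta
        l_abc = normal_kernel(abs(y - y_obs) / h)  # equation (2)
        thetas.append(theta)
        weights.append(l_abc * 1.0)                # equation (1), pi/g = 1
    return thetas, weights


def posterior_mean(thetas, weights):
    # Self-normalised estimate of E[theta | y_obs], i.e. mu_f with f(theta) = theta.
    return sum(t * w for t, w in zip(thetas, weights)) / sum(weights)


def normalising_constant(weights):
    # Average weight; estimates the ABC normalising constant up to the
    # kernel's proportionality constant.
    return sum(weights) / len(weights)


rng = random.Random(1)
thetas, weights = abc_importance_sampling(y_obs=1.0, n=20000, h=0.3, rng=rng)
mu_f = posterior_mean(thetas, weights)
z = normalising_constant(weights)
```

For this toy model the exact posterior mean is 0.5, while the ABC target with $h = 0.3$ has mean $1/(2 + h^2) \approx 0.48$, which illustrates the approximation introduced by the bandwidth.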

The interplay between these tuning choices has been the subject of considerable research but is not 
considered further here. For further information on this and all aspects of ABC see the review papers 

m. 

Lazy ABC splits simulation of data into two stages. First the output of some initial simulation stage
$x$ is simulated conditional on $\theta$, then, sometimes, a full dataset $y$ is simulated conditional on $\theta$ and
$x$. The latter is referred to as the continuation simulation stage. The variable $x$ should encapsulate
all the information which is required to resume the simulation, so it may be high dimensional. There
is considerable freedom in choosing the initial simulation stage. It may conclude after a prespecified
set of operations, or after some random event is observed. Another tuning choice is introduced,
the continuation probability function $\alpha(\theta, x)$. This outputs a value in [0,1] which is the probability
of continuing to the continuation simulation stage. The desired behaviour in choosing the initial
simulation stage and $\alpha$ is that simulating $x$ is computationally cheap but can be used to save time by
assigning small continuation probabilities to many unpromising simulations.

Given all the above notation, lazy ABC is Algorithm 1. To avoid division by zero in step 5, it will
be required that $\alpha(\theta, x) > 0$, although this condition can be weakened [10].


Algorithm:

Perform the following steps for $i = 1, \ldots, N$:

1 Simulate $\theta_i$ from $g(\theta)$.

2 Simulate $x_i$ conditional on $\theta_i$ and set $a_i = \alpha(\theta_i, x_i)$.

3 With probability $a_i$ continue to step 4. Otherwise perform early stopping: let $\ell_i = 0$ and
go to step 6.

4 Simulate $y_i$ conditional on $\theta_i$ and $x_i$.

5 Set $\ell_i = K[d(s(y_i), s(y_{\text{obs}})) / h] / a_i$.

6 Set $w_i = \ell_i \pi(\theta_i) / g(\theta_i)$.

Output:

A set of $N$ pairs of $(\theta_i, w_i)$ values.


Algorithm 1: Lazy ABC 
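The steps of Algorithm 1 can be sketched on a toy model as follows. The model, the split into stages and the continuation rule are all hypothetical choices made for illustration: the data are 10 draws from $N(\theta, 1)$ summarised by their mean, the initial stage simulates the first 2 draws, the prior and importance density are both $N(0, 3^2)$, and `continuation_probability` is an arbitrary untuned rule that is always positive.

```python
import math
import random


def continuation_probability(x, s_obs):
    # Hypothetical rule: continue for certain if the partial mean is within 2
    # of the observed summary, otherwise continue with probability 0.1.
    partial_mean = sum(x) / len(x)
    return 1.0 if abs(partial_mean - s_obs) < 2.0 else 0.1


def lazy_abc(s_obs, n, h, rng, n_first=2, n_total=10):
    thetas, weights = [], []
    n_continued = 0
    for _ in range(n):
        # Step 1: draw theta from g (here g = prior, so pi/g = 1).
        theta = rng.gauss(0.0, 3.0)
        # Step 2: initial simulation stage and continuation probability.
        x = [rng.gauss(theta, 1.0) for _ in range(n_first)]
        a = continuation_probability(x, s_obs)
        # Step 3: random stopping decision.
        if rng.random() < a:
            n_continued += 1
            # Step 4: continuation simulation stage, resuming from x.
            y = x + [rng.gauss(theta, 1.0) for _ in range(n_total - n_first)]
            s = sum(y) / n_total
            # Step 5: normal kernel (scaling assumed) divided by a.
            ell = math.exp(-0.5 * ((s - s_obs) / h) ** 2) / a
        else:
            ell = 0.0  # early stopping
        # Step 6: importance weight, with pi/g = 1 in this toy setting.
        thetas.append(theta)
        weights.append(ell * 1.0)
    return thetas, weights, n_continued


rng = random.Random(2)
thetas, weights, n_continued = lazy_abc(s_obs=1.0, n=20000, h=0.2, rng=rng)
mu_f = sum(t * w for t, w in zip(thetas, weights)) / sum(weights)
```

In this sketch roughly half of the iterations stop early, yet the weighted sample still targets the same ABC posterior as the version that always continues.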

Lazy ABC has the same target as standard ABC importance sampling, in the sense that the Monte
Carlo estimates $\mu_f$ and $z$ converge to the same values for $N \to \infty$. This is proved by Theorem 1
and related discussion in [10]. A sketch of the argument is as follows. Standard ABC is essentially an
importance sampling algorithm: each iteration samples a parameter value $\theta$ from $g(\theta)$ and assigns
it a random weight $w$ given by (1). The randomness is due to the random simulation of data $y$. The
expectation of this weight conditional on $\theta$ is

$$E[w_{\text{ABC}} | \theta] = E[L_{\text{ABC}} | \theta] \, \pi(\theta) / g(\theta)$$

where expectation is taken over values of $y$.

Lazy ABC acts similarly but uses different random weights

$$w_{\text{lazy}} = \begin{cases} L_{\text{ABC}} \, \alpha^{-1} \pi(\theta) / g(\theta) & \text{with probability } \alpha = \alpha(\theta, x) \\ 0 & \text{otherwise} \end{cases}$$

The randomness here is due to simulation of $x$ and $y$. Taking expectations gives:

$$E[w_{\text{lazy}} | \theta, x] = E[L_{\text{ABC}} | \theta, x] \, \pi(\theta) / g(\theta)$$
$$\Rightarrow E[w_{\text{lazy}} | \theta] = E[L_{\text{ABC}} | \theta] \, \pi(\theta) / g(\theta)$$

From the theory of importance sampling algorithms with random weights (see [10]) this ensures that
both algorithms target the same distribution.
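The expectation step in this argument can be checked numerically. The sketch below fixes a single value of the ABC likelihood estimate and of the continuation probability (both numbers are arbitrary illustrative choices, with $\pi/g$ taken as 1) and simulates the random stopping decision many times. The mean lazy weight should match the standard weight, while the mean squared weight is inflated by a factor $1/\alpha$.

```python
import random


def lazy_weight(l_abc, alpha, rng):
    # With probability alpha keep the weight, rescaled by 1/alpha; otherwise 0.
    return l_abc / alpha if rng.random() < alpha else 0.0


rng = random.Random(0)
l_abc, alpha, n = 0.4, 0.25, 200000   # arbitrary illustrative values
draws = [lazy_weight(l_abc, alpha, rng) for _ in range(n)]

mean_w = sum(draws) / n                   # approximates l_abc
mean_w2 = sum(w * w for w in draws) / n   # approximates l_abc ** 2 / alpha
```

The unchanged mean is what preserves the target distribution. The inflated second moment is the price paid in effective sample size, which is why the choice of continuation probability needs tuning.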

This argument shows lazy ABC targets the same $\mu_f$ and $z$ quantities as standard ABC, for any
choice of initial simulation stage and $\alpha$. However, for poor choices of these tuning decisions it may
converge very slowly. The next section considers effective tuning.






2 Lazy ABC tuning 


The quality of lazy ABC tuning can be judged by an appropriate measure of efficiency. Here this
is defined as effective sample size (ESS) divided by computing time. The ESS for a sample with
weights $w_1, \ldots, w_N$ is

$$N_{\text{eff}} = N \left[ N^{-1} \sum_{i=1}^N w_i \right]^2 \Big/ \left[ N^{-1} \sum_{i=1}^N w_i^2 \right]$$

It can be shown [10] that for large $N$ the variance of $\mu_f$ typically equals that of $N_{\text{eff}}$ independent
samples. Computing time is taken to be the sum of CPU time for each core used (as the lazy ABC
iterations can easily be performed in parallel).
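As a small illustration, the ESS and the efficiency measure can be computed from a list of weights as below. The function names are hypothetical, and the ESS is written in the algebraically equivalent form $(\sum_i w_i)^2 / \sum_i w_i^2$.

```python
def effective_sample_size(weights):
    # ESS = (sum of weights)^2 / (sum of squared weights).
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0.0:
        return 0.0  # every weight is zero: no effective samples
    return total * total / total_sq


def efficiency(weights, cpu_seconds):
    # ESS per unit of CPU time, with time summed over all cores used.
    return effective_sample_size(weights) / cpu_seconds


equal = effective_sample_size([1.0, 1.0, 1.0, 1.0])       # all weights equal
degenerate = effective_sample_size([1.0, 0.0, 0.0, 0.0])  # one non-zero weight
half = effective_sample_size([2.0, 2.0, 0.0, 0.0])        # half the weights zero
```

Equal weights give an ESS equal to the sample size, while a sample dominated by a single weight has an ESS close to one.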


Theorem 2 of [10] gives the following results on the choice of $\alpha$ which maximises the efficiency of
lazy ABC in the asymptotic case of large $N$. For now let $\phi$ represent $(\theta, x)$. Then the optimal choice
of $\alpha$ is of the following form:

$$\alpha(\phi) = \min\left\{ 1, \lambda \left[ \frac{\gamma(\phi)}{\bar{T}_2(\phi)} \right]^{1/2} \right\} \qquad (4)$$
$$\text{where } \gamma(\phi) = E[w_{\text{ABC}}^2 | \phi] \qquad (5)$$

Here $\gamma(\phi)$ is the expectation given $\phi$ of $w_{\text{ABC}}^2$, the squared weight which would be achieved under
standard ABC importance sampling; $\bar{T}_2(\phi)$ is the expected time for steps 4-6 of Algorithm 1 given $\phi$; and
$\lambda > 0$ is a tuning parameter that controls the relative importance of maximising ESS (maximised
by $\lambda = \infty$) and minimising computation time (minimised by $\lambda = 0$).
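A minimal sketch of this rule is given below, assuming the square-root form $\alpha = \min\{1, \lambda [\gamma / \bar{T}_2]^{1/2}\}$. The function and argument names are hypothetical, and in practice `gamma` and `t2` would be estimates rather than known values.

```python
import math


def optimal_continuation_probability(gamma, t2, lam):
    """Continuation probability min(1, lam * sqrt(gamma / t2)).

    gamma: expected squared standard ABC weight for these decision statistics
    t2:    expected time for the continuation stage (must be positive)
    lam:   tuning parameter trading off ESS against computing time
    """
    if t2 <= 0.0:
        raise ValueError("expected continuation time must be positive")
    if gamma < 0.0 or lam < 0.0:
        raise ValueError("gamma and lam must be non-negative")
    return min(1.0, lam * math.sqrt(gamma / t2))


# Promising simulation: large expected squared weight, so always continue.
a_promising = optimal_continuation_probability(gamma=1.0, t2=1.0, lam=3.0)
# Unpromising and expensive simulation: continue only half the time.
a_unpromising = optimal_continuation_probability(gamma=0.04, t2=4.0, lam=5.0)
```

The rule continues more often when the expected squared weight is large relative to the expected cost of finishing the simulation. A value of exactly zero (for example when `gamma` is zero) would violate the positivity requirement of Algorithm 1, so an implementation would need to guard against it.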


A natural approach to tuning $\alpha$ in practice is as follows. The remainder of the section discusses
these steps in more detail.


1. Using Algorithm 1 with $\alpha = 1$ simulate training data, recording for each iteration the
time to perform steps 1-3 of Algorithm 1 and the time for steps 4-6.

2. Estimate $\gamma(\phi)$ and $\bar{T}_2(\phi)$ from training data.

3. Choose $\lambda$ to maximise an efficiency estimate based on the training data.

4. Decide amongst various choices of initial simulation stage (and $\phi$, see below) by maximising
estimated efficiency. By collecting appropriate data for these choices in step 1 it is not
necessary to repeat it.


Step 2 is a regression problem, but is not feasible for $\phi = (\theta, x)$ as this will typically be very high
dimensional. Instead $\alpha$ can be based on low dimensional features of $(\theta, x)$, referred to as decision
statistics. That is, only $\alpha$ functions of the form $\alpha(\phi(\theta, x))$ are considered, where $\phi(\theta, x)$ outputs a
vector of decision statistics. The optimal such $\alpha$ is again given by (4) and (5). The choice of which
decision statistics to use can be included in step 4 above.

Estimating $\gamma(\phi)$ by regression is also challenging if there are regions of $\phi$ space for which most
of the responses are zero. This is typically the case for uniform $K$. In [10] various tuning methods
were proposed for uniform $K$ but these are complicated and rely on strong assumptions. A simpler
alternative used here is to use a normal $K$ as it has full support.

Local regression techniques jsl] are suggested for step 2. This is because the behaviour of the
responses typically varies considerably for different $\phi$ values, motivating fitting separate regressions.
Firstly, the typical magnitude of $L_{\text{ABC}}$ varies over widely different scales. Secondly, for both
regressions the distribution of the residuals may also vary with $\phi$. To ensure positive predictions, the use
of degree zero regression is suggested, i.e. a Nadaraya-Watson kernel estimator.
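A minimal one-dimensional Nadaraya-Watson estimator with a Gaussian smoothing kernel is sketched below. The function name and the restriction to a scalar decision statistic are simplifying assumptions of this sketch. Because the prediction is a weighted average of the training responses with non-negative weights, it cannot be negative when the responses are non-negative.

```python
import math


def nadaraya_watson(x_train, y_train, x_new, bandwidth):
    """Degree zero local regression: a kernel-weighted average of responses."""
    kernel_weights = [
        math.exp(-0.5 * ((x_new - x) / bandwidth) ** 2) for x in x_train
    ]
    total = sum(kernel_weights)
    if total == 0.0:
        # All kernel weights underflowed: x_new is too far from the training data.
        raise ValueError("no training points within reach of x_new")
    return sum(w * y for w, y in zip(kernel_weights, y_train)) / total


# Two training points with equal kernel weight at the midpoint.
midpoint = nadaraya_watson([0.0, 1.0], [0.0, 2.0], x_new=0.5, bandwidth=0.5)
# Constant responses are reproduced wherever the prediction is made.
constant = nadaraya_watson([0.0, 1.0, 2.0], [3.0, 3.0, 3.0], x_new=0.7, bandwidth=0.5)
```

In the tuning procedure the training responses would be the squared weights (for the first regression) and the continuation stage times (for the second), each regressed on the decision statistics.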

The efficiency estimate required in steps 3 and 4 can be formed from the training data and
proposed choice of $\alpha$. Let $(a_i)_{1 \le i \le M}$ be the $\alpha$ values for the training data and $(\ell_i)_{1 \le i \le M}$ be the
values of $L_{\text{ABC}}$. The realised efficiency of the training data is not used since it is based on a small
sample size. Instead the asymptotic efficiency is estimated. Under weak assumptions (see [10])
this is $E(w)^2 E(w^2)^{-1} E(T)^{-1}$, where random variable $T$ is the CPU time for a single iteration
of lazy ABC. Note that $E(w)$ is constant (the ABC approximation for the normalising constant
$\pi(y_{\text{obs}})$) under any tuning choices, so it is omitted. This leaves an estimate up to proportionality
of $[E(w^2) E(T)]^{-1}$ which can be used to calculate efficiency relative to standard ABC (found by
setting $\alpha = 1$). An estimate $\hat{T}$ of $E(T)$ and, using (5), an estimate of $E(w^2)$ are each formed as
sums over the $M$ training iterations.
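One way such an efficiency estimate could be computed is sketched below. The specific estimators are assumptions of this sketch rather than a transcription of the method: it uses the identity $E(w_{\text{lazy}}^2 | \phi) = \gamma(\phi) / \alpha(\phi)$, which follows from the definition of the lazy weights, estimates $E(T)$ by the average of the stage one time plus $a_i$ times the fitted stage two time, assumes the square-root form for $\alpha$, floors $\alpha$ at a small hypothetical value `a_min` to keep it positive, and chooses $\lambda$ by a simple grid search.

```python
import math


def estimated_relative_efficiency(gamma_hat, t1, t2_hat, lam, a_min=1e-6):
    """Estimated efficiency of lazy ABC relative to standard ABC (alpha = 1).

    gamma_hat: fitted expected squared weights, one per training iteration
    t1:        recorded times for the initial stage
    t2_hat:    fitted expected times for the continuation stage
    """
    m = len(gamma_hat)
    a = [
        max(a_min, min(1.0, lam * math.sqrt(g / t)))
        for g, t in zip(gamma_hat, t2_hat)
    ]
    mean_time_lazy = sum(s + ai * t for s, ai, t in zip(t1, a, t2_hat)) / m
    mean_w2_lazy = sum(g / ai for g, ai in zip(gamma_hat, a)) / m
    mean_time_std = sum(s + t for s, t in zip(t1, t2_hat)) / m
    mean_w2_std = sum(gamma_hat) / m
    # Efficiency is proportional to 1 / (E(w^2) E(T)); take the ratio.
    return (mean_w2_std * mean_time_std) / (mean_w2_lazy * mean_time_lazy)


def choose_lambda(gamma_hat, t1, t2_hat, grid):
    # Grid search for the lambda with the highest estimated efficiency.
    return max(
        grid,
        key=lambda lam: estimated_relative_efficiency(gamma_hat, t1, t2_hat, lam),
    )


# Toy training set: one promising and one unpromising simulation.
gamma_hat = [1.0, 1e-4]
t1 = [1.0, 1.0]
t2_hat = [9.0, 9.0]
rel_eff = estimated_relative_efficiency(gamma_hat, t1, t2_hat, lam=3.0)
best_lam = choose_lambda(gamma_hat, t1, t2_hat, [0.1, 0.3, 1.0, 3.0, 10.0, 1e9])
```

With a very large $\lambda$ every continuation probability equals one and the relative efficiency is exactly one, recovering standard ABC. Very small $\lambda$ saves time but inflates the squared weights, so the estimate falls below one.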

3 Example 

As an example the spatial extremes application of j^l is used. This application and the
implementation of lazy ABC is described in full in [10]. A short sketch is that the model of interest has two
parameters $(c, \nu)$. Given these, data $y_{t,d}$ can be generated for years $1 \le t \le 100$ and locations
$1 \le d \le 20$. These represent annual extreme measurements, e.g. of rainfall or temperature. An ABC
approach has been proposed including choices of $s(\cdot)$ and $d(\cdot, \cdot)$. Also, given data for a subset of
locations an estimate of the ABC distance can be formed.

Simulation of data is hard to interrupt and later resume. However the most expensive part of the
process is calculating the summary statistics, which involves calculating certain coefficients for
every triple of locations. Therefore the initial simulation stage of lazy ABC is to simulate all the data
and calculate an estimated distance based on a subset of locations $L$, which is used as the decision
statistic $\phi$. The continuation stage is to calculate the coefficients for the remaining triples and return
the realised distance.

Tuning of lazy ABC was performed as described in Section 2 using backwards selection in step 4 to
find an appropriate subset of locations to use as $L$. To fit the regressions estimating $\gamma(\phi)$ and $\bar{T}_2(\phi)$
a Nadaraya-Watson kernel estimator was used with a Gaussian kernel and bandwidth 0.5, chosen
manually.

Repeating the example of [10], 6 simulated data sets were analysed using standard and lazy ABC.
Each analysis used 10® simulations in total. In lazy ABC M — 10"^ of these were used for training. 
The results are shown in Table 1. The efficiency improvements of lazy ABC relative to standard
ABC are of similar magnitudes to those in [10] but are less close to the values estimated in step 3 of
tuning.


Parameters    Standard       Lazy           Lazy     Relative efficiency
c      ν      Time (10®s)    Time (10®s)    ESS      Estimated    Actual
0.5    1      26.7           8.0            131.6    3.9          2.2
1      1      25.6           7.1            174.2    4.5          3.1
1      3      25.5           8.3            185.3    3.8          2.8
3      1      25.6           7.6            267.2    4.2          4.5
3      3      25.2           8.2            193.5    3.9          3.0
5      3      25.7           8.4            162.4    3.7          2.5


Table 1: Simulation study on spatial extremes. Each row represents the analysis of a simulated
dataset under the given values of parameters $c$ and $\nu$. In each analysis a choice of the bandwidth $h$ was made under
standard ABC so that the ESS was 200, and the same value was used for lazy ABC. The lazy ABC
output sample includes the training data, as described in [10]. Also its computation time includes the
time for tuning calculation (roughly 70 seconds). Iterations were run in parallel and computation
times are summed over all cores used.
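The "Actual" relative efficiency column can be reproduced from the other columns as a worked check, using efficiency defined as ESS divided by computing time and the standard ABC ESS of 200 stated in the caption. The time units cancel in the ratio. The variable and function names below are illustrative only.

```python
STANDARD_ESS = 200.0  # ESS under standard ABC, as stated in the table caption

# (c, nu, standard time, lazy time, lazy ESS, reported actual relative efficiency)
rows = [
    (0.5, 1, 26.7, 8.0, 131.6, 2.2),
    (1.0, 1, 25.6, 7.1, 174.2, 3.1),
    (1.0, 3, 25.5, 8.3, 185.3, 2.8),
    (3.0, 1, 25.6, 7.6, 267.2, 4.5),
    (3.0, 3, 25.2, 8.2, 193.5, 3.0),
    (5.0, 3, 25.7, 8.4, 162.4, 2.5),
]


def relative_efficiency(standard_time, lazy_time, lazy_ess):
    # Ratio of (ESS / time) for lazy ABC to (ESS / time) for standard ABC.
    return (lazy_ess / lazy_time) / (STANDARD_ESS / standard_time)


computed = [relative_efficiency(r[2], r[3], r[4]) for r in rows]
for row, value in zip(rows, computed):
    print(f"c={row[0]}, nu={row[1]}: computed {value:.2f}, reported {row[5]}")
```

Each computed value agrees with the reported one to the precision shown in the table.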


4 Conclusion 

The paper has reviewed lazy ABC [10], a method to speed up ABC without introducing further
approximations to the target distribution. Unlike [10], non-uniform ABC kernels have been considered.
This allows a simpler approach to tuning, which provides a comparable three-fold efficiency increase
in a spatial extremes example.

Several extensions to lazy ABC are described in [10]: multiple stopping decisions, choosing $h$ after
running the algorithm and a similar scheme for likelihood-based inference. Other potential
extensions include using the lazy ABC approach in ABC versions of MCMC or SMC algorithms, or
focusing on model choice.